SUBMITTED TO IEEE TRANSACTIONS ON INFORMATION THEORY



Message-passing for Maximum Weight Independent Set

Sujay Sanghavi Devavrat Shah Alan Willsky 






Abstract: We investigate the use of message-passing algorithms
for the problem of finding the max-weight independent set
(MWIS) in a graph. First, we study the performance of the 
classical loopy max-product belief propagation. We show that 
each fixed point estimate of max-product can be mapped in a 
natural way to an extreme point of the LP polytope associated 
with the MWIS problem. However, this extreme point may not be 
the one that maximizes the value of node weights; the particular 
extreme point at final convergence depends on the initialization of 
max-product. We then show that if max-product is started from 
the natural initialization of uninformative messages, it always 
solves the correct LP - if it converges. This result is obtained 
via a direct analysis of the iterative algorithm, and cannot be 
obtained by looking only at fixed points. 

The tightness of the LP relaxation is thus necessary for max- 
product optimality, but it is not sufficient. Motivated by this 
observation, we show that a simple modification of max-product 
becomes gradient descent on (a convexified version of) the dual 
of the LP, and converges to the dual optimum. We also develop 
a message-passing algorithm that recovers the primal MWIS 
solution from the output of the descent algorithm. We show 
that the MWIS estimate obtained using these two algorithms 
in conjunction is correct when the graph is bipartite and the 
MWIS is unique. 

Finally, we show that any problem of MAP estimation for 
probability distributions over finite domains can be reduced to 
an MWIS problem. We believe this reduction will yield new 
insights and algorithms for MAP estimation. 



I. Introduction 

The max-weight independent set (MWIS) problem is the 
following: given a graph with positive weights on the nodes, 
find the heaviest set of mutually non-adjacent nodes. MWIS 
is a well studied combinatorial optimization problem that 
naturally arises in many applications. It is known to be NP- 
hard, and hard to approximate [5]. In this paper we investigate 
the use of message-passing algorithms, like loopy max-product 
belief propagation, as practical solutions for the MWIS prob- 
lem. We now summarize our motivations for doing so, and 
then outline our contribution. 

Our primary motivation comes from applications. The 
MWIS problem arises naturally in many scenarios involving 
resource allocation in the presence of interference. It is often 
the case that large instances of the weighted independent 
set problem need to be (at least approximately) solved in 
a distributed manner using lightweight data structures. In 
Section II-A we describe one such application: scheduling
channel access and transmissions in wireless networks. Message-passing
algorithms provide a promising alternative to
current scheduling algorithms. 

Another, equally important, motivation is the potential 
for obtaining new insights into the performance of exist- 
ing message-passing algorithms, especially on loopy graphs. 
Tantalizing connections have been established between such 
algorithms and more traditional approaches like linear programming
(see [1], [2], [8] and references therein). We consider the
MWIS problem to understand this connection, as it provides
a rich (it is NP-hard), yet relatively (analytically) tractable,
framework to investigate such connections.

A. Our contributions 

In Section II we formally describe the MWIS problem,
formulate it as an integer program, and present its natural LP
relaxation. We also describe how the MWIS problem arises in 
wireless network scheduling. 

In Section III we first describe how we propose using
max-product (as a heuristic) for solving the MWIS problem. 
Specifically, we construct a probability distribution whose 
MAP estimate is the MWIS of the given graph. Max-product, 
which is a heuristic for finding MAP estimates, emerges 
naturally from this construction. 

Max-product is an iterative algorithm, and is typically 
executed until it converges to a fixed point. In Section IV
we show that fixed points always exist, and characterize their 
structure. Specifically, we show that there is a one-to-one map 
between estimates of fixed points, and extreme points of the 
independent set LP polytope. This polytope is defined only 
by the graph, and each of its extrema corresponds to the LP 
optimum for a different node weight function. This implies that 
max-product fixed points attempt to solve (the LP relaxation 
of) an MWIS problem on the correct graph, but with different 
(possibly incorrect) node weights. This stands in contrast to 
its performance for the weighted matching problem [1], [2], 
[9], for which it is known to always solve the LP with correct 
weights. 

Since max-product is a deterministic algorithm, the particular
fixed point (if any) that is reached depends on the initialization.
In Section V we pursue an alternative line of analysis, and
directly investigate the performance of the iterative algorithm
itself, started from the "natural" initialization of uninformative
messages. For this case, we show that max-product estimates
exactly correspond to the true LP at all times, not just at the
fixed point.

Max-product bears a striking semantic similarity to dual 
coordinate descent on the LP. With the intention of modifying 
max-product to make it as powerful as LP, in Section VI we
develop two iterative message-passing algorithms. The first, 
obtained by a minor modification of max-product, approxi- 
mately calculates the optimal solution to the dual of the LP 
relaxation of the MWIS problem. It does this via coordinate 
descent on a convexified version of the dual. The second 
algorithm uses this approximate optimal dual to produce an 
estimate of the MWIS. This estimate is correct when the 
original graph is bipartite. We believe that this algorithm 
should be of broader interest. 

The above uses of max-product for MWIS involved posing 
the MWIS as a MAP estimation problem. In the final Section
VII we do the reverse: we show how any MAP estimation
problem on finite domains can be converted into a MWIS 
problem on a suitably constructed auxiliary graph. This implies 
that any algorithm for solving the independent set problem 
immediately yields an algorithm for MAP estimation. This 
reduction may prove useful from both practical and analytical 
perspectives. 

II. Max-weight Independent Set, and its LP 
Relaxation 

Consider a graph $G = (V, E)$, with a set $V$ of nodes and
a set $E$ of edges. Let $\mathcal{N}(i) = \{j \in V : (i,j) \in E\}$ be the
neighbors of $i \in V$. Positive weights $w_i$, $i \in V$, are associated
with each node. A subset of $V$ will be represented by a vector
$x = (x_i) \in \{0,1\}^{|V|}$, where $x_i = 1$ means $i$ is in the subset
and $x_i = 0$ means $i$ is not in the subset. A subset $x$ is called an
independent set if no two nodes in the subset are connected by
an edge: $(x_i, x_j) \neq (1,1)$ for all $(i,j) \in E$. We are interested
in finding a maximum weight independent set (MWIS) $x^*$.
This can be naturally posed as an integer program, denoted
below by IP. The linear programming relaxation of IP is
obtained by replacing the integrality constraints $x_i \in \{0,1\}$
with the constraints $x_i \geq 0$. We will denote the corresponding
linear program by LP. The dual of LP is denoted below by
DUAL.

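$$\text{IP}: \quad \max \sum_{i=1}^{n} w_i x_i \quad \text{s.t.} \quad x_i + x_j \leq 1 \ \text{ for all } (i,j) \in E, \qquad x_i \in \{0,1\}.$$

$$\text{LP}: \quad \max \sum_{i=1}^{n} w_i x_i \quad \text{s.t.} \quad x_i + x_j \leq 1 \ \text{ for all } (i,j) \in E, \qquad x_i \geq 0.$$

$$\text{DUAL}: \quad \min \sum_{(i,j) \in E} \lambda_{ij} \quad \text{s.t.} \quad \sum_{j \in \mathcal{N}(i)} \lambda_{ij} \geq w_i \ \text{ for all } i \in V, \qquad \lambda_{ij} \geq 0 \ \text{ for all } (i,j) \in E.$$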

It is well-known that LP can be solved efficiently, and if it
has an integral optimal solution then this solution is an MWIS
of $G$. If this is the case, we say that there is no integrality gap
between LP and IP or equivalently that the LP relaxation is
tight.

Properties of the LP 

We now briefly state some of the well-known properties of 
the MWIS LP, as these will be used/referred to in the paper. 
The polytope of the LP is the set of feasible points for the 
linear program. An extreme point of the polytope is one that 
cannot be expressed as a convex combination of other points 
in the polytope. 

Lemma 2.1: ([12], Theorem 64.7) The LP polytope has
the following properties:

1) For any graph, the MWIS LP polytope is half-integral:
any extreme point will have each $x_i = 0$, $1$ or $1/2$.

2) For bipartite graphs the LP polytope is integral: each
extreme point will have $x_i = 0$ or $1$.

Half-integrality is an intriguing property that holds for LP
relaxations of a few combinatorial problems (e.g. vertex cover,
matchings etc.). Half-integrality implies that any extreme-point
optimum of LP will have some nodes set to 1, and all their
neighbors set to 0. The nodes set to 1/2 will appear in clusters:
each such node will have at least one other neighbor also set
to 1/2. We will see later that a similar structure arises in
max-product fixed points.
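As a small illustration of the integrality gap and of half-integrality, the following sketch compares IP and LP on a 5-cycle with all weights equal to 3. The graph, the weights, and the helper names are assumptions chosen for illustration and are not taken from the paper. Because Lemma 2.1 guarantees that some LP optimum is half-integral, the sketch obtains the LP value by enumerating vectors with entries in {0, 1/2, 1} rather than by calling an LP solver.

```python
from itertools import product

def is_feasible(x, edges):
    # Edge constraints x_i + x_j <= 1 shared by IP and LP.
    return all(x[i] + x[j] <= 1 for i, j in edges)

def best_value(weights, edges, levels):
    # Exhaustive search over vectors whose entries come from `levels`.
    # Only suitable for very small graphs.
    best, best_x = 0.0, tuple(0 for _ in weights)
    for x in product(levels, repeat=len(weights)):
        if is_feasible(x, edges):
            val = sum(w * xi for w, xi in zip(weights, x))
            if val > best:
                best, best_x = val, x
    return best, best_x

# Hypothetical instance: a 5-cycle with weight 3 on every node.
edges5 = [(k, (k + 1) % 5) for k in range(5)]
w5 = [3, 3, 3, 3, 3]

ip_val, ip_x = best_value(w5, edges5, (0, 1))        # integral points only
lp_val, lp_x = best_value(w5, edges5, (0, 0.5, 1))   # half-integral points
```

On this instance the integral search returns value 6, while the half-integral search returns 7.5 with every node set to 1/2, so the relaxation is not tight.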

Lemma 2.2: ([12], Corollary 64.9a) LP optima are partially
correct: for any graph, any LP optimum $x^*$ and any
node $i$, if the mass $x_i^*$ is integral then there exists an MWIS
for which that node's membership is given by $x_i^*$.

The next lemma states the standard complementary slackness
conditions of linear programming, specialized for the
MWIS LP, and for the case when there is no integrality gap.

Lemma 2.3: When there is no integrality gap between
IP and LP, there exists a pair of optimal solutions $x = (x_i)$,
$\lambda = (\lambda_{ij})$ of LP and DUAL respectively, such that:

(a) $x \in \{0,1\}^n$,

(b) $x_i \left( \sum_{j \in \mathcal{N}(i)} \lambda_{ij} - w_i \right) = 0$ for all $i \in V$,

(c) $(x_i + x_j - 1)\, \lambda_{ij} = 0$ for all $(i,j) \in E$.
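The conditions of Lemma 2.3 can be checked mechanically on a small instance. The sketch below is only an illustration: the three-node path, its weights, the candidate primal and dual vectors, and the function name `check_lemma_2_3` are all assumptions made for this example.

```python
def check_lemma_2_3(weights, edges, x, lam, tol=1e-9):
    """Return True if (x, lam) is primal/dual feasible and satisfies (a)-(c)."""
    n = len(weights)
    # Total dual mass on the edges incident to each node.
    load = [0.0] * n
    for (i, j), l in zip(edges, lam):
        load[i] += l
        load[j] += l
    primal_ok = all(x[i] + x[j] <= 1 for i, j in edges)
    dual_ok = (all(l >= -tol for l in lam)
               and all(load[i] >= weights[i] - tol for i in range(n)))
    cond_a = all(xi in (0, 1) for xi in x)
    cond_b = all(abs(x[i] * (load[i] - weights[i])) <= tol for i in range(n))
    cond_c = all(abs((x[i] + x[j] - 1) * l) <= tol
                 for (i, j), l in zip(edges, lam))
    return primal_ok and dual_ok and cond_a and cond_b and cond_c

# Hypothetical path 0 - 1 - 2 with weights 2, 3, 2.
weights = [2.0, 3.0, 2.0]
edges = [(0, 1), (1, 2)]

good = check_lemma_2_3(weights, edges, x=[1, 0, 1], lam=[2.0, 2.0])
bad = check_lemma_2_3(weights, edges, x=[0, 1, 0], lam=[2.0, 2.0])
```

Here the pair x = (1, 0, 1), λ = (2, 2) passes all checks, with primal and dual values both equal to 4, while x = (0, 1, 0) fails condition (b) at the middle node.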



A. Sample Application: Scheduling in Wireless Networks 

We now briefly describe an important application that
requires an efficient, distributed solution to the MWIS problem:
transmission scheduling in wireless networks that lack a
centralized infrastructure, and where nodes can only communicate
with local neighbors (e.g. see [15]). Such networks are
ubiquitous in the modern world: examples range from sensor 
networks that lack wired connections to the fusion center, and 
ad-hoc networks that can be quickly deployed in areas without 
coverage, to the 802.11 wi-fi networks that currently represent 
the most widely used method for wireless data access. 

Fundamentally, any two wireless nodes that transmit at the 
same time and over the same frequencies will interfere with 
each other, if they are located close by. Interference means 
that the intended receivers will not be able to decode the 
transmissions. Typically in a network only certain pairs of 
nodes interfere. The scheduling problem is to decide which 
nodes should transmit at a given time over a given frequency,
so that (a) there is no interference, and (b) nodes which
have a large amount of data to send are given priority. In 
particular, it is well known that if each node is given a weight 
equal to the data it has to transmit, optimal network operation 
demands scheduling the set of nodes with highest total weight. 
If a "conflict graph" is made, with an edge between every
pair of interfering nodes, the scheduling problem is exactly 
the problem of finding the MWIS of the conflict graph. 
The lack of an infrastructure, the fact that nodes often have 
limited capabilities, and the local nature of communication, 
all necessitate a lightweight distributed algorithm for solving 
the MWIS problem.
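To make the conflict-graph construction concrete, here is a minimal sketch under simplifying assumptions that are not part of the paper: nodes sit on a line, two nodes interfere when they are closer than a fixed radius, and each node's weight is its backlog. The function names and the numbers are hypothetical, and the exhaustive search stands in for whatever MWIS solver is actually used.

```python
from itertools import combinations

def conflict_graph(positions, radius):
    # Assumed interference rule: nodes closer than `radius` conflict.
    edges = []
    for (i, p), (j, q) in combinations(enumerate(positions), 2):
        if abs(p - q) < radius:
            edges.append((i, j))
    return edges

def best_schedule(backlog, edges):
    # Exhaustive search for the heaviest interference-free set (small n only).
    n = len(backlog)
    best_set, best_w = (), 0
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            chosen = set(subset)
            if any(i in chosen and j in chosen for i, j in edges):
                continue
            w = sum(backlog[i] for i in subset)
            if w > best_w:
                best_set, best_w = subset, w
    return best_set, best_w

positions = [0.0, 1.0, 2.0, 3.0]   # hypothetical node locations
backlog = [4, 5, 1, 3]             # hypothetical data queued at each node
edges = conflict_graph(positions, radius=1.5)
schedule, total = best_schedule(backlog, edges)
```

With these numbers the conflict graph is the path 0-1-2-3, and the selected schedule is nodes 1 and 3 with total backlog 8.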

III. Max-product for MWIS 

The classical max-product algorithm is a heuristic that 
can be used to find the MAP assignment of a probability 
distribution. Now, given an MWIS problem on $G = (V, E)$,
associate a binary random variable $X_i$ with each $i \in V$ and
consider the following joint distribution: for $x \in \{0,1\}^n$,

$$p(x) = \frac{1}{Z} \prod_{(i,j) \in E} \mathbf{1}_{\{x_i + x_j \leq 1\}} \prod_{i \in V} \exp(w_i x_i), \qquad (1)$$

where $Z$ is the normalization constant. In the above, $\mathbf{1}$ is
the standard indicator function: $\mathbf{1}_{\text{true}} = 1$ and $\mathbf{1}_{\text{false}} = 0$.
It is easy to see that $p(x) = \frac{1}{Z} \exp\left(\sum_i w_i x_i\right)$ if $x$ is an
independent set, and $p(x) = 0$ otherwise. Thus, any MAP
estimate $\arg\max_x p(x)$ corresponds to a maximum weight
independent set of $G$.
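The correspondence between the MAP estimate of this distribution and the MWIS can be checked by direct enumeration on a tiny graph. The sketch below assumes a three-node path with weights 2, 3, 2; the instance and the function name are illustrative only.

```python
import math
from itertools import product

def unnormalized_p(x, weights, edges):
    # Product of edge indicators times exp(sum_i w_i x_i).
    if any(x[i] + x[j] > 1 for i, j in edges):
        return 0.0
    return math.exp(sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical path 0 - 1 - 2 with weights 2, 3, 2.
weights = [2.0, 3.0, 2.0]
edges = [(0, 1), (1, 2)]

configs = list(product((0, 1), repeat=len(weights)))
Z = sum(unnormalized_p(x, weights, edges) for x in configs)
p = {x: unnormalized_p(x, weights, edges) / Z for x in configs}
map_x = max(p, key=p.get)
```

The MAP configuration is (1, 0, 1), the independent set of weight 4, and every configuration that violates an edge constraint has probability zero.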

The update equations for max-product can be derived in 
a standard and straightforward fashion from the probability 
distribution. We now describe the max-product algorithm as 
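derived from $p$. At every iteration $t$ each node $i$ sends a
message $\{m^t_{i \to j}(0), m^t_{i \to j}(1)\}$ to each neighbor $j \in \mathcal{N}(i)$.
Each node also maintains a belief vector $\{b_i^t(0), b_i^t(1)\}$. The
message and belief updates, as well as the final output, are
computed as follows.

Max-product for MWIS

(o) Initially, $m^0_{i \to j}(0) = m^0_{i \to j}(1) = 1$ for all $(i,j) \in E$.

(i) The messages are updated as follows:

$$m^{t+1}_{i \to j}(0) = \max\left\{ \prod_{k \neq j,\, k \in \mathcal{N}(i)} m^t_{k \to i}(0), \; e^{w_i} \prod_{k \neq j,\, k \in \mathcal{N}(i)} m^t_{k \to i}(1) \right\},$$

$$m^{t+1}_{i \to j}(1) = \prod_{k \neq j,\, k \in \mathcal{N}(i)} m^t_{k \to i}(0).$$

(ii) Nodes $i \in V$ compute their beliefs as follows:

$$b_i^t(0) = \prod_{k \in \mathcal{N}(i)} m^t_{k \to i}(0), \qquad b_i^t(1) = e^{w_i} \prod_{k \in \mathcal{N}(i)} m^t_{k \to i}(1).$$

(iii) Estimate the max-weight independent set $x(b^t)$ as follows:

$$x_i(b^t) = 1 \ \text{ if } b_i^t(1) > b_i^t(0), \qquad x_i(b^t) = 0 \ \text{ if } b_i^t(1) < b_i^t(0), \qquad x_i(b^t) = \,? \ \text{ if } b_i^t(1) = b_i^t(0).$$

(iv) Update $t = t + 1$; repeat from (i) till $x(b^t)$ converges and
output the converged estimate.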
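Steps (o) to (iii) can be sketched directly in product form. The code below is a minimal illustration, assuming synchronous updates on all edges, a fixed number of iterations in place of the convergence test in step (iv), and a hypothetical three-node path as input; the function name `max_product` is not from the paper. In product form the messages can grow quickly, so this is only meant for a handful of iterations on small graphs.

```python
import math

def max_product(weights, edges, iters):
    n = len(weights)
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # Step (o): uninformative messages, stored as m[(i, j)] = [m(0), m(1)].
    m = {(i, j): [1.0, 1.0] for i in range(n) for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in m:
            # Step (i): products over neighbors k of i other than j.
            prod0 = math.prod(m[(k, i)][0] for k in nbrs[i] if k != j)
            prod1 = math.prod(m[(k, i)][1] for k in nbrs[i] if k != j)
            new[(i, j)] = [max(prod0, math.exp(weights[i]) * prod1), prod0]
        m = new
    est = []
    for i in range(n):
        # Step (ii): beliefs; step (iii): estimate by comparing them.
        b0 = math.prod(m[(k, i)][0] for k in nbrs[i])
        b1 = math.exp(weights[i]) * math.prod(m[(k, i)][1] for k in nbrs[i])
        est.append(1 if b1 > b0 else 0 if b1 < b0 else "?")
    return est

# Hypothetical path 0 - 1 - 2 with weights 2, 3, 2.
x_hat = max_product([2.0, 3.0, 2.0], [(0, 1), (1, 2)], iters=5)
```

On this tree the estimate settles on the two end nodes, which is the MWIS of weight 4.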

For the purpose of analysis, we find it convenient to 
transform the messages and their dynamics as follows. First, 
define 



$$\gamma^t_{i \to j} = \log\left( \frac{m^t_{i \to j}(0)}{m^t_{i \to j}(1)} \right).$$



Here, since the algorithm starts with all messages being strictly 
positive, the messages will remain strictly positive over any 
finite number of iterations. Therefore, taking logarithm is a 
valid operation. With this new definition, step (i) of the max- 
product becomes 



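$$\gamma^{t+1}_{i \to j} = \left( w_i - \sum_{k \in \mathcal{N}(i) - j} \gamma^t_{k \to i} \right)_+ , \qquad (2)$$

where we use the notation $(x)_+ = \max\{x, 0\}$. The final
estimation step (iii) of max-product takes the following form:

$$x_i(\gamma^t) = 1 \quad \text{if } w_i > \sum_{k \in \mathcal{N}(i)} \gamma^t_{k \to i}, \qquad (3)$$

$$x_i(\gamma^t) = 0 \quad \text{if } w_i < \sum_{k \in \mathcal{N}(i)} \gamma^t_{k \to i}, \qquad (4)$$

$$x_i(\gamma^t) = \,? \quad \text{if } w_i = \sum_{k \in \mathcal{N}(i)} \gamma^t_{k \to i}. \qquad (5)$$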

This modification of max-product is often known as the "min- 
sum" algorithm, and is just a reformulation of the max- 
product. In the rest of the paper we refer to this as simply 
the max-product algorithm. 
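In the log domain the algorithm reduces to one scalar message per directed edge. The following sketch implements the update (2) and the estimation rule (3) to (5) with synchronous updates; the helper names and the three-node path used as input are assumptions made for illustration.

```python
def neighbors(n, edges):
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return nbrs

def update(gamma, weights, nbrs):
    """One synchronous application of the update (2)."""
    return {
        (i, j): max(0.0, weights[i]
                    - sum(gamma[(k, i)] for k in nbrs[i] if k != j))
        for (i, j) in gamma
    }

def estimate(gamma, weights, nbrs):
    """Estimation rule (3)-(5): compare w_i with the incoming messages."""
    out = []
    for i, w in enumerate(weights):
        incoming = sum(gamma[(k, i)] for k in nbrs[i])
        out.append(1 if w > incoming else 0 if w < incoming else "?")
    return out

# Hypothetical path 0 - 1 - 2 with weights 2, 3, 2.
weights = [2.0, 3.0, 2.0]
edges = [(0, 1), (1, 2)]
nbrs = neighbors(len(weights), edges)

gamma = {(i, j): 0.0 for i in nbrs for j in nbrs[i]}   # gamma = 0 start
for _ in range(3):
    gamma = update(gamma, weights, nbrs)
x_hat = estimate(gamma, weights, nbrs)
```

After three iterations the messages no longer change: the end nodes send 2 to the middle node, the middle node sends 1 to each end node, and the estimate is (1, 0, 1).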

IV. Fixed Points of Max-product 

When applied to general graphs, max product may either 
(a) not converge, (b) converge, and yield the correct answer, 
or (c) converge but yield an incorrect answer. Characterizing
when each of the three situations can occur is a challenging 
and important task. One approach to this task has been to look 
directly at the fixed points, if any, of the iterative procedure 
(see e.g. [7]). In this section we investigate properties of 
fixed points, by formally establishing a connection to the 
LP polytope. 

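Note that a set of messages $\gamma^*$ is a fixed point of max-product
if, for all $(i,j) \in E$,

$$\gamma^*_{i \to j} = \left( w_i - \sum_{k \in \mathcal{N}(i) - j} \gamma^*_{k \to i} \right)_+ . \qquad (6)$$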

The following lemma establishes that fixed points always exist.

Lemma 4.1: There exists at least one fixed point $\gamma^*$ such
that $\gamma^*_{i \to j} \in [0, w_i]$ for each $(i,j) \in E$.

Proof: Let $w^* = \max_i w_i$, and suppose at time $t$ each
$\gamma^t_{i \to j} \in [0, w^*]$. From (2) it is clear that this will result in
the messages $\gamma^{t+1}$ at the next time also having each $\gamma^{t+1}_{i \to j} \in
[0, w^*]$. Thus, the max-product update rule (2) maps a message
vector $\gamma^t \in [0, w^*]^{2|E|}$ into another vector in $[0, w^*]^{2|E|}$. Also,
it is easy to see that (2) is a continuous function. Therefore,
by Brouwer's fixed point theorem there exists a fixed point
$\gamma^* \in [0, w^*]^{2|E|}$. Moreover, since every message is non-negative,
the fixed point equation gives $\gamma^*_{i \to j} \leq w_i$. ■

We now study properties of the fixed points in order to
understand the correctness of the estimate output by max-product.
The following theorem characterizes the structure of
estimates at fixed points. Recall that the estimate $x_i(\gamma^*)$ for
node $i$ can be 0, 1 or ?.

Theorem 4.1: Let $\gamma^*$ be a fixed point, and let $x(\gamma^*) =
(x_i(\gamma^*))$ be the corresponding estimate. Then,

1) If $x_i(\gamma^*) = 1$ then every neighbor $j \in \mathcal{N}(i)$ has
$x_j(\gamma^*) = 0$.

2) If $x_i(\gamma^*) = 0$ then at least one neighbor $j \in \mathcal{N}(i)$ has
$x_j(\gamma^*) = 1$.

3) If $x_i(\gamma^*) = \,?$ then at least one neighbor $j \in \mathcal{N}(i)$ has
$x_j(\gamma^*) = \,?$.

Before proving Theorem 4.1 we discuss its implications.
Recall from Lemma 2.1 that every extreme point of the
LP polytope consists of each node having a value of 0, 1
or 1/2. If all weights are positive, the optimum of LP will have
the following characteristics: every node with value 1 will be
surrounded by nodes with value 0, every node with value 0 will
have at least one neighbor with value 1, and every node with
value 1/2 will have one neighbor with value 1/2. These properties
bear a remarkable similarity to those in Theorem 4.1. Indeed,
given a fixed point $\gamma^*$ and its estimates $x(\gamma^*)$, make a vector
$y$ by setting

$y_i = 1/2$ if the estimate for $i$ is $x_i(\gamma^*) = \,?$,
$y_i = 1$ if $x_i(\gamma^*) = 1$,
$y_i = 0$ if $x_i(\gamma^*) = 0$.

Then, Theorem 4.1 implies that $y$ will be an extreme point
of the LP polytope, and also one that maximizes some weight
function consisting of positive node weights. Note however
that this may not be the true weights $w_i$. In other words,
given any MWIS problem with graph $G$ and weights $w$, each
max-product fixed point represents the optimum of the LP
relaxation of some MWIS problem on the same graph $G$, but
possibly with different weights.

The fact that max-product estimates optimize a different 
weight function means that both eventualities are possible: 
LP giving the correct answer but max-product failing, and 
vice versa. We now provide simple examples for each one of 
these situations. 

Figures 1 and 2 present graphs and the corresponding
fixed points of max-product. In each graph, numbers represent
node weights, and an arrow from $i$ to $j$ represents a message
value of $\gamma^*_{i \to j} = 2$. All other messages, which do not have
arrows, have value zero. The boxed nodes indicate the ones
for which the estimate $x_i(\gamma^*) = 1$. It is easy to verify that
both examples represent max-product fixed points.

For the graph in Figure 1 the max-product fixed point
results in an incorrect estimate. However, the graph is bipartite,
and hence LP will provide the correct answer. For the graph
in Figure 2 there is an integrality gap between LP and
IP: setting each $x_i = 1/2$ yields an optimal value of 7.5 for
LP, while the optimal solution to IP has value 6. Note that
the estimate at the fixed point of max-product is the correct
MWIS. It is also worth noticing that in both of these examples
the fixed points lie in the strict interiors of non-trivial regions
of attraction: starting the iterative procedure from within these
regions will result in convergence to the corresponding fixed
point. These examples indicate that it may not be possible to
resolve the question of relative strength of the two procedures
based solely on an analysis of the fixed points of max-product.

Fig. 1. This example shows that a max-product fixed point may result in an
incorrect answer even though LP is tight.

Fig. 2. This example shows that a max-product fixed point can find the right
MWIS even though the LP relaxation is not tight.
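The dependence of the estimate on the fixed point can be reproduced on a small bipartite instance. The sketch below is an assumed example and is not claimed to be the graph of Figure 1: it uses the complete bipartite graph with two nodes of weight 3 on one side and three nodes of weight 2.5 on the other, so the heavier independent set is the three-node side (7.5 against 6). The helper names are hypothetical, and updates are synchronous.

```python
def neighbors(n, edges):
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return nbrs

def update(gamma, weights, nbrs):
    # Log-domain max-product update, applied to every directed edge.
    return {
        (i, j): max(0.0, weights[i]
                    - sum(gamma[(k, i)] for k in nbrs[i] if k != j))
        for (i, j) in gamma
    }

def estimate(gamma, weights, nbrs):
    out = []
    for i, w in enumerate(weights):
        incoming = sum(gamma[(k, i)] for k in nbrs[i])
        out.append(1 if w > incoming else 0 if w < incoming else "?")
    return out

A, B = [0, 1], [2, 3, 4]                 # the two sides of K_{2,3}
edges = [(a, b) for a in A for b in B]
weights = [3.0, 3.0, 2.5, 2.5, 2.5]
nbrs = neighbors(5, edges)

def messages(a_to_b, b_to_a):
    # Same value on every A-to-B edge, and on every B-to-A edge.
    g = {}
    for a in A:
        for b in B:
            g[(a, b)] = a_to_b
            g[(b, a)] = b_to_a
    return g

fp_wrong = messages(3.0, 0.0)   # selects side A (weight 6)
fp_right = messages(0.0, 2.5)   # selects side B (weight 7.5)

# Start slightly away from fp_wrong and iterate twice.
near = messages(2.9, 0.05)
for _ in range(2):
    near = update(near, weights, nbrs)
```

Both message vectors are fixed points of the update, but only the second yields the MWIS; the perturbed start returns to the first fixed point, consistent with each fixed point having its own region of convergence.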
The particular fixed point, if any, that max-product converges
to depends on the initialization of the messages; each
fixed point will have its own region of convergence. In Section
V we directly analyze the iterative algorithm when started
from the "natural" initialization of unbiased messages. As
a byproduct of this analysis, we prove that if max-product
from this initialization converges, then the resulting fixed-point
estimate is the optimum of LP; thus, in this case the
max-product fixed point solves the "correct" LP.

Proof of Theorem 4.1: The proof of Theorem 4.1
follows from manipulations of the fixed point equations (6).
For ease of notation we replace $\gamma^*$ by $\gamma$. We first prove
the following statements on how the estimates determine the
relative ordering of the two messages (one in each direction)
on any given edge:

$$x_i(\gamma) = 1 \ \Rightarrow \ \gamma_{i \to j} > \gamma_{j \to i} \quad \forall j \in \mathcal{N}(i) \qquad (7)$$

$$x_i(\gamma) = \,? \ \Rightarrow \ \gamma_{i \to j} = \gamma_{j \to i} \quad \forall j \in \mathcal{N}(i) \qquad (8)$$

The above equations cover every case except for edges between
two nodes with 0 estimates. This is covered by the
following:

$$x_i(\gamma) = 0 \ \text{ and } \ x_j(\gamma) = 0 \ \Rightarrow \ \gamma_{i \to j} = \gamma_{j \to i} = 0 \qquad (9)$$

Suppose first that $i$ is such that $x_i(\gamma) = 1$. By definition
(6) of the fixed point,

$$\gamma_{i \to j} \geq w_i - \sum_{k \in \mathcal{N}(i) - j} \gamma_{k \to i}.$$

However, by (3), the fact that $x_i(\gamma) = 1$ implies that

$$w_i - \sum_{k \in \mathcal{N}(i) - j} \gamma_{k \to i} > \gamma_{j \to i}.$$

Putting the above two equations together proves (7). The proof
of (9) is along similar lines. Suppose now $i$ is such that
$x_i(\gamma) = \,?$. By (5) this implies that $w_i = \sum_{k \in \mathcal{N}(i)} \gamma_{k \to i}$, and
so from (6) we have that

$$\gamma_{i \to j} = w_i - \sum_{k \in \mathcal{N}(i) - j} \gamma_{k \to i}.$$

Also, the fact that $x_i(\gamma) = \,?$ means that

$$w_i - \sum_{k \in \mathcal{N}(i) - j} \gamma_{k \to i} = \gamma_{j \to i}.$$

Putting the above two equations together proves (8). We now
prove the three parts of Theorem 4.1.

Proof of Part 1): Let $i$ have estimate $x_i(\gamma) = 1$, and suppose
there exists a neighbor $j \in \mathcal{N}(i)$ such that $x_j(\gamma) = \,?$ or
1. Then, from (7) it follows that $\gamma_{i \to j} > \gamma_{j \to i}$, and from
(7) and (8) applied at node $j$ it further follows that $\gamma_{i \to j} \leq \gamma_{j \to i}$.
However, this is a contradiction, and thus every neighbor of $i$
has to have estimate 0.

Proof of Part 2): Let $i$ have estimate $x_i(\gamma) = 0$. Since
$w_i > 0$, (4) implies that there exists at least one neighbor
$j \in \mathcal{N}(i)$ such that the message $\gamma_{j \to i} > 0$. From (9), this
means that the estimate $x_j(\gamma)$ cannot be 0. Suppose now that
$x_j(\gamma) = \,?$. From (8) it follows that $\gamma_{i \to j} = \gamma_{j \to i} > 0$, and so

$$\gamma_{i \to j} = w_i - \sum_{k \in \mathcal{N}(i) - j} \gamma_{k \to i}.$$

However, since $\gamma_{i \to j} = \gamma_{j \to i}$, this means that

$$\gamma_{j \to i} = w_i - \sum_{k \in \mathcal{N}(i) - j} \gamma_{k \to i},$$

which violates (4), and thus the assumption that $x_i(\gamma) = 0$.
Thus it has to be that $x_j(\gamma) = 1$.

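Proof of Part 3): Let $i$ have estimate $x_i(\gamma) = \,?$. Since $w_i >
0$, (5) implies that there exists at least one neighbor $j \in \mathcal{N}(i)$
such that the message $\gamma_{j \to i} > 0$. From (8) it follows that

$$\gamma_{i \to j} = \gamma_{j \to i} = w_j - \sum_{l \in \mathcal{N}(j) - i} \gamma_{l \to j}.$$

Thus $w_j = \sum_{l \in \mathcal{N}(j)} \gamma_{l \to j}$, which by (5) means that $x_j(\gamma) = \,?$. Thus
$i$ has at least one neighbor $j$ with estimate $x_j(\gamma) = \,?$. ■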

V. Direct Analysis of the Iterative Algorithm 

In the last section, we saw that fixed points of max-product
may correspond to optima of "wrong" linear programs: ones that
operate on the same feasible set as LP, but optimize a different
linear function. However, there will also be fixed points that
correspond to optimizing the correct function. Max-product is
a deterministic algorithm, and so which of these fixed points
(if any) are reached is determined by the initialization. In
this section we directly analyze the iterative algorithm itself,
as started from the "natural" initialization $\gamma = 0$, which
corresponds to uninformative messages.

We show that the resulting estimates are characterized by
optima of the true LP, at every time instant (not just at fixed
points). This implies that, if a fixed point is reached, it will
exactly reflect an optimum of LP. Our main theorem in this
section is stated below.

Theorem 5.1: Given any MWIS problem on weighted graph
$G$, suppose max-product is started from the initial condition
$\gamma = 0$. Then, for any node $i \in G$:

1) If there exists any optimum $x^*$ of LP for which the
mass assigned to $i$ satisfies $x_i^* < 1$, then the max-product
estimate $x_i(\gamma^t)$ is 0 or ? for all even times $t$.

2) If there exists any optimum $x^*$ of LP for which the
mass assigned to node $i$ satisfies $x_i^* > 0$, then the max-product
estimate $x_i(\gamma^t)$ is 1 or ? for all odd times $t$.

From the above theorem, it is easy to see what will happen if
LP has non-integral optima. Suppose node $i$ is assigned non-integral
mass at some LP optimum $x^*$. This implies that $i$ and
$x^*$ will satisfy both parts of the above theorem. The estimate
at node $i$ will thus either keep varying every alternate time
slot, or will converge to ?. Either way, max-product will fail
to provide a useful estimate for node $i$.
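The alternating behaviour is easy to observe on a triangle with unit weights, where the only LP optimum puts mass 1/2 on every node. The sketch below is an illustration under assumptions: synchronous log-domain updates started from all-zero messages, hypothetical helper names, and an iteration counter that starts at 0, which need not match the time indexing used in the theorem.

```python
def neighbors(n, edges):
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return nbrs

def update(gamma, weights, nbrs):
    # Log-domain max-product update on every directed edge.
    return {
        (i, j): max(0.0, weights[i]
                    - sum(gamma[(k, i)] for k in nbrs[i] if k != j))
        for (i, j) in gamma
    }

def estimate(gamma, weights, nbrs):
    out = []
    for i, w in enumerate(weights):
        incoming = sum(gamma[(k, i)] for k in nbrs[i])
        out.append(1 if w > incoming else 0 if w < incoming else "?")
    return out

# Triangle with unit weights; the LP optimum is x = (1/2, 1/2, 1/2).
weights = [1.0, 1.0, 1.0]
edges = [(0, 1), (1, 2), (0, 2)]
nbrs = neighbors(3, edges)

gamma = {(i, j): 0.0 for i in nbrs for j in nbrs[i]}
history = []
for _ in range(4):
    history.append(estimate(gamma, weights, nbrs))
    gamma = update(gamma, weights, nbrs)
```

The recorded estimates flip between all ones and all zeros at every iteration, so no node ever receives a stable, informative estimate.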

Theorem 5.1 also reveals further insights into the max-product
estimates. Suppose for example the estimates converge
to informative answers for a subset of the nodes. Theorem
5.1 implies that every LP optimum assigns the same integral
mass to any fixed node in this subset, and that the converged
estimate is the same as this mass.

The proof of this theorem relies on the computation tree
interpretation of max-product estimates. We now specify this
interpretation for our problem, and then prove Theorem 5.1.

Computation Tree for MWIS 

The proof of Theorem 5.1 relies on the computation tree
interpretation [19], [22] of the loopy max-product estimates.
In this section we briefly outline this interpretation. For any
node $i$, the computation tree at time $t$, denoted by $T_i(t)$, is
defined recursively as follows: $T_i(1)$ is just the node $i$. This is
the root of the tree, and in this case is also its only leaf. The
tree $T_i(t)$ at time $t$ is generated from $T_i(t-1)$ by adding to
each leaf of $T_i(t-1)$ a copy of each of its neighbors in $G$,
except for the one neighbor that is already present in $T_i(t-1)$.
Each node in $T_i$ is a copy of a node in $G$, and the weights of
the nodes in $T_i$ are the same as the corresponding nodes in $G$.
The computation tree interpretation is stated in the following
lemma.

Lemma 5.1: For any node $i$ at time $t$,

• $x_i(\gamma^t) = 1$ if and only if the root of $T_i(t)$ is a member
of every MWIS on $T_i(t)$.

• $x_i(\gamma^t) = 0$ if and only if the root of $T_i(t)$ is not a member
of any MWIS on $T_i(t)$.

• $x_i(\gamma^t) = \,?$ else.

Thus the max-product estimates correspond to max-weight
independent sets on the computation trees $T_i(t)$, as opposed
to on the original graph $G$.
Example: Consider Figure 3. On the left is the original
loopy graph $G$. On the right is $T_a(4)$, the computation tree
for node $a$ at time 4.
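The recursive definition of the computation tree, and the root test of Lemma 5.1, can be sketched as follows. The triangle graph, the nested-tuple tree representation, and the function names are assumptions made for this illustration; the MWIS on the tree is found by the usual two-state dynamic program (root included or excluded).

```python
def computation_tree(nbrs, root, depth):
    """Tree with `depth` levels, as nested (graph_node, children) tuples."""
    def grow(node, parent, levels_left):
        if levels_left == 1:
            return (node, [])
        # Each node gets a copy of every neighbor in G except its parent.
        children = [grow(k, node, levels_left - 1)
                    for k in nbrs[node] if k != parent]
        return (node, children)
    return grow(root, None, depth)

def count_nodes(tree):
    return 1 + sum(count_nodes(c) for c in tree[1])

def tree_mwis(tree, weights):
    """Return (best weight with root included, best weight with root excluded)."""
    node, children = tree
    inc, exc = weights[node], 0.0
    for child in children:
        c_inc, c_exc = tree_mwis(child, weights)
        inc += c_exc                 # children of an included node are out
        exc += max(c_inc, c_exc)     # otherwise each child subtree is free
    return inc, exc

def root_estimate(tree, weights):
    inc, exc = tree_mwis(tree, weights)
    # Root in every MWIS -> 1, in no MWIS -> 0, otherwise undecided.
    return 1 if inc > exc else 0 if inc < exc else "?"

# Hypothetical loopy graph: a triangle with unit weights.
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
weights = [1.0, 1.0, 1.0]

size_t4 = count_nodes(computation_tree(nbrs, 0, 4))
estimates = [root_estimate(computation_tree(nbrs, 0, t), weights)
             for t in range(1, 5)]
```

For the triangle, the tree of depth 4 has 7 nodes, and the root estimate alternates between 1 and 0 as the depth grows.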




Proof of Theorem 5.1

We now prove Theorem 5.1. For brevity, in this proof we
will use the notation $x_i^t = x_i(\gamma^t)$ for the estimates. Suppose
now that the claim for odd times (part 2 of the theorem) is not
true, i.e. there exists a node $i$, an optimum $x^*$ of LP with
$x_i^* > 0$, and an odd time $t$ at
which the estimate is $x_i^t = 0$. Let $T_i(t)$ be the corresponding
computation tree. Using Lemma 5.1 this means that the root $i$
is not a member of any MWIS of $T_i(t)$. Let $I$ be some MWIS
on $T_i(t)$. We now define the following set of nodes

$$I^* = \{ j \in T_i(t) : j \notin I, \text{ and copy of } j \text{ in } G \text{ has } x_j^* > 0 \}.$$

In words, $I^*$ is the set of nodes in $T_i(t)$ which are not in $I$,
and whose copies in $G$ are assigned strictly positive mass by
the LP optimum $x^*$.

Note that by assumption the root $i \in I^*$ and $i \notin I$. Now,
from the root, recursively build a maximal alternating subtree
$S$ as follows: first add root $i$, which is in $I^* - I$. Then add all
neighbors of $i$ that are in $I - I^*$. Then add all their neighbors
in $I^* - I$, and so on. The building of $S$ stops either when it
hits the bottom level of the tree, or when no more nodes can
be added while still maintaining the alternating structure. Note
the following properties of $S$:

• $S$ is the disjoint union of $(S \cap I)$ and $(S \cap I^*)$.

• For every $j \in S \cap I$, all its neighbors in $I^*$ are included
in $S \cap I^*$. Similarly for every $j \in S \cap I^*$, all its neighbors
in $I$ are included in $S \cap I$.

• Any edge $(j,k)$ in $T_i(t)$ has at most one endpoint in
$(S \cap I)$, and at most one in $(S \cap I^*)$.

We now state a lemma, which we will prove later. The proof
uses the fact that $t$ is odd.

Lemma 5.2: The weights satisfy $w(S \cap I) \leq w(S \cap I^*)$.

We now use this lemma to prove the theorem. Consider the
set $I'$ which changes $I$ by flipping $S$:

$$I' = I - (S \cap I) + (S \cap I^*).$$

We first show that $I'$ is also an independent set on $T_i(t)$. This
means that we need to show that every edge $(j,k)$ in $T_i(t)$
touches at most one node in $I'$. There are thus three possible
scenarios for edge $(j,k)$:

• $j, k \notin S$. In this case, membership of $j, k$ in $I'$ is the
same as in $I$, which is an independent set. So $(j,k)$ has
at most one node touching $I'$.

• One node $j \in S \cap I$. In this case, $j \notin I'$, and hence again
at most one of $j, k$ belongs to $I'$.

• One node $k \in S \cap I^*$ but other node $j \notin S \cap I$. This means
that $j \notin I$, because every neighbor of $k$ in $I$ should be
included in $S \cap I$. This means that $j \notin I'$, and hence only
node $k \in I'$ for edge $(j,k)$.

Thus $I'$ is an independent set on $T_i(t)$. Also, by Lemma 5.2
we have that

$$w(I') \geq w(I).$$

However, $I$ is an MWIS, and hence it follows that $I'$ is also
an MWIS of $T_i(t)$. However, by construction, root $i \in I'$,
which violates the fact that $x_i^t = 0$. The contradiction is
thus established, and this part of the theorem is proved. The
other part is proved in a similar fashion. ■

Proof of Lemma 5.2

The proof of this lemma involves a perturbation argument
on the LP. For each node $j \in G$, let $m_j$ denote the number
of times $j$ appears in $S \cap I$ and $n_j$ the number of times it
appears in $S \cap I^*$. Define

$$x = x^* + \epsilon (m - n). \qquad (10)$$

We now state a lemma that is proved immediately
following this one.

Lemma 5.3: $x$ is a feasible point for LP, for small enough $\epsilon$.

We now use this lemma to finish the proof of Lemma 5.2.
Since $x^*$ is an optimum of LP, it follows that $w'x \leq w'x^*$,
and so $w'm \leq w'n$. However, by definition, $w'm = w(S \cap I)$
and $w'n = w(S \cap I^*)$. This finishes the proof. ■

Proof of Lemma 5.3

We now show that this $x$ as defined in (10) is a feasible
point for LP, for small enough $\epsilon$. To do so we have to check
node constraints $x_j \geq 0$ and edge constraints $x_j + x_k \leq 1$
for every edge $(j,k) \in G$. Consider first the node constraints.
Clearly we only need to check them for any $j$ which has a
copy $j \in I^* \cap S$. If this is so, then by the definition of
$I^*$, $x_j^* > 0$. Thus, for any $m_j$ and $n_j$, making $\epsilon$ small enough
can ensure that $x_j^* + \epsilon(m_j - n_j) \geq 0$.

Before we proceed to checking the edge constraints, we
make two observations. Note that for any node $j$ in the tree, if
$j \in S \cap I$ then

• $x_j^* < 1$, i.e. the mass put on $j$ by the LP optimum $x^*$
is strictly less than 1. This is because of the alternating
way in which the tree is constructed: a node $j$ in the tree
is included in $S \cap I$ only if the parent $p$ of $j$ is in $S \cap I^*$
(note that the root $i \in S \cap I^*$ by assumption). However,
from the definition of $I^*$, this means that $x_p^* > 0$, i.e. the
parent has positive mass at the LP optimum $x^*$. This
means that $x_j^* < 1$, as having $x_j^* = 1$ would mean that
the edge constraint $x_p^* + x_j^* \leq 1$ is violated.

• $j$ is not a leaf of the tree. This is because $S$ alternates
between $I$ and $I^*$, and starts with $I^*$ at the root in level
1 (which is odd). Hence $S \cap I$ will occupy even levels
of the tree, but the tree has odd depth (by assumption $t$
is odd).

Now consider the edge constraints. For any edge $(j,k)$, if the
LP optimum $x^*$ is such that the constraint is loose, i.e. if
$x_j^* + x_k^* < 1$, then making $\epsilon$ small enough will ensure that
$x_j + x_k \leq 1$. So we only need to check the edge constraints
which are tight at $x^*$.

For edges with $x_j^* + x_k^* = 1$, every time any copy of one
of the nodes $j$ or $k$ is included in $S \cap I$, the other node is
included in $S \cap I^*$. This is because of the following: if $j$ is
included in $S \cap I$, and $k$ is its parent, we are done since this
means $k \in S \cap I^*$. So suppose $k$ is not the parent of $j$. From
the above it follows that $j$ is not a leaf of the tree, and hence
$k$ will be one of its children. Also, from above, the mass on
$j$ satisfies $x_j^* < 1$. However, by assumption $x_j^* + x_k^* = 1$, and
hence the mass on $k$ is $x_k^* > 0$. This means that the child $k$
has to be included in $S \cap I^*$.

It is now easy to see that the edge constraints are satisfied:
for every edge constraint which is tight at $x^*$, every time the
mass on one of the endpoints is increased by $\epsilon$ (because of
that node appearing in $S \cap I$), the mass on the other endpoint
is decreased by $\epsilon$ (because it appears in $S \cap I^*$). ■

VI. A Convergent message-passing algorithm 

In Section V we saw that max-product started from the
natural initial condition solves the correct LP at the fixed
point, if it converges. However, convergence is not guaranteed;
indeed, it is quite easy to construct examples where it will not
converge. In this section we present a convergent message-passing
algorithm for finding the MWIS of a graph. It is based
on modifying max-product by drawing upon dual coordinate
descent and the barrier method. The algorithm retains the
iterative and distributed nature of max-product. It
operates in two steps, as described below.

ALGO($\epsilon, \delta, \delta_1$)

(o) Given an MWIS problem, and (small enough) positive
parameters $\epsilon, \delta$, run sub-routine DESCENT($\epsilon, \delta$) to obtain
an output $\lambda^{\epsilon,\delta} = (\lambda^{\epsilon,\delta}_{ij})_{(i,j) \in E}$, which is an approximate
dual optimum of the MWIS problem.

(i) Next, using (small enough) $\delta_1 > 0$, use EST($\lambda^{\epsilon,\delta}, \delta_1$) to
produce an estimate for the MWIS as the output of the
algorithm.

Next, we describe DESCENT and EST, state their properties,
and then combine them to obtain a result on the convergence,
correctness and convergence time of the overall algorithm.

A. DESCENT: algorithm

Here, we describe the DESCENT algorithm. It is influenced
by max-product and by the dual coordinate descent
algorithm for DUAL. First, consider the standard coordinate
descent algorithm for DUAL. It operates with variables
$\{\lambda_{ij}, (i,j) \in E\}$ (with notation $\lambda_{ij} = \lambda_{ji}$). It is an iterative
procedure; in each iteration $t$ one edge $(i,j) \in E$ is picked^1
and updated

$$\lambda^{t+1}_{ij} = \max\Big\{0,\ \Big(w_i - \sum_{k \in \mathcal{N}(i),\, k \neq j} \lambda^{t}_{ik}\Big),\ \Big(w_j - \sum_{k \in \mathcal{N}(j),\, k \neq i} \lambda^{t}_{jk}\Big)\Big\}. \qquad (11)$$

^1 Edges can be picked either in round-robin fashion, or uniformly at random.

The $\lambda$ on all the other edges remain unchanged from $t$ to
$t+1$. Notice the similarity (at least syntactic) between standard
dual coordinate descent (11) and max-product. In essence,
the dual coordinate descent can be thought of as a sequential
bidirectional version of the max-product algorithm.

Since the dual coordinate descent algorithm is designed so
that at each iteration the cost of DUAL is non-increasing, it
always converges in terms of the cost. However, the converged
solution may not be optimal because DUAL contains the
"non-box" constraints $\sum_{j \in \mathcal{N}(i)} \lambda_{ij} \ge w_i$. Therefore, a direct
use of dual coordinate descent is not sufficient. In order
to make the algorithm convergent with minimal modification
while retaining its iterative message-passing nature, we use a
barrier (penalty) function based approach. With an appropriate
choice of barrier, and using a result of Luo and Tseng [3], we
will find the new algorithm to be convergent.

To this end, consider the following convex optimization
problem obtained from DUAL by adding a logarithmic barrier
for constraint violations, with $\epsilon > 0$ controlling the penalty due to
violation. Define

$$g(\epsilon, \lambda) = \sum_{(i,j) \in E} \lambda_{ij} - \epsilon \sum_{i \in V} \log\Big(\sum_{j \in \mathcal{N}(i)} \lambda_{ij} - w_i\Big).$$

Then, the modified DUAL optimization problem becomes

CP($\epsilon$) : min $g(\epsilon, \lambda)$
subject to $\lambda_{ij} \ge 0$, for all $(i,j) \in E$.

The algorithm DESCENT($\epsilon, \delta$) is coordinate descent on
CP($\epsilon$), to within tolerance $\delta$, implemented via passing messages
between nodes. We describe it in detail as follows.

DESCENT($\epsilon, \delta$)

(o) The parameters are variables $\lambda_{ij}$, one for each edge
$(i,j) \in E$. We will use the notation $\lambda^t_{ij} = \lambda^t_{ji}$.
The vector $\lambda$ is iteratively updated, with $t$ denoting the
iteration number.

o Initially, set $t = 0$ and $\lambda^0_{ij} = \max\{w_i, w_j\}$ for all
$(i,j) \in E$.

(i) In iteration $t + 1$, update parameters as follows:

o Pick an edge $(i,j) \in E$. The edge selection is done
in a round-robin manner over all edges.

o For all $(i',j') \in E$, $(i',j') \neq (i,j)$ do nothing, i.e.
$\lambda^{t+1}_{i'j'} = \lambda^{t}_{i'j'}$.

o For edge $(i,j)$, nodes $i$ and $j$ exchange messages as
follows:

$$\gamma^{t+1}_{i \to j} = w_i - \sum_{k \neq j,\, k \in \mathcal{N}(i)} \lambda^{t}_{ki}, \qquad \gamma^{t+1}_{j \to i} = w_j - \sum_{k' \neq i,\, k' \in \mathcal{N}(j)} \lambda^{t}_{k'j}.$$

o Update $\lambda^{t+1}_{ij}$ as follows: with $a = \gamma^{t+1}_{i \to j}$ and $b = \gamma^{t+1}_{j \to i}$,

$$\lambda^{t+1}_{ij} = \Big(\frac{a + b + 2\epsilon + \sqrt{(a-b)^2 + 4\epsilon^2}}{2}\Big)_+, \qquad (12)$$

where $(z)_+ = \max\{z, 0\}$.

(ii) Update $t = t + 1$ and repeat till the algorithm converges
within $\delta$ for each component.

(iii) Output the vector $\lambda$, denoted by $\lambda^{\epsilon,\delta}$, when the algorithm
stops.

Remark. The updates in DESCENT above are obtained
by a small, but important, perturbation of standard dual
coordinate descent (11). To see this, consider the iterative step
in (12). First, note that

$$\frac{a + b + 2\epsilon + \sqrt{(a-b)^2 + 4\epsilon^2}}{2} > \frac{a + b + 2\epsilon + \sqrt{(a-b)^2}}{2} = \frac{a + b + |a-b| + 2\epsilon}{2} = \max(a,b) + \epsilon.$$

Similarly,

$$\frac{a + b + 2\epsilon + \sqrt{(a-b)^2 + 4\epsilon^2}}{2} \le \frac{a + b + 2\epsilon + \sqrt{(a-b)^2 + 4\epsilon|a-b| + 4\epsilon^2}}{2} = \frac{a + b + |a-b| + 4\epsilon}{2} = \max(a,b) + 2\epsilon.$$

Therefore, we conclude that (12) can be re-written as

$$\lambda^{t+1}_{ij} = \beta\epsilon + \max\Big\{-\beta\epsilon,\ \Big(w_i - \sum_{k \in \mathcal{N}(i) \setminus j} \lambda^{t}_{ik}\Big),\ \Big(w_j - \sum_{k \in \mathcal{N}(j) \setminus i} \lambda^{t}_{kj}\Big)\Big\},$$

for some $\beta \in (1,2]$ whose precise value depends
on $\gamma^{t+1}_{i \to j}, \gamma^{t+1}_{j \to i}$. This small perturbation takes $\lambda$ close to the
true dual optimum. In practice, we believe that instead of
calculating the exact value of $\beta$, use of some arbitrary $\beta \in (1,2]$
should be sufficient.

B. DESCENT: properties

The DESCENT algorithm finds a good approximation to
an optimum of DUAL, for small enough $\epsilon, \delta$. Furthermore,
it always converges, and does so quickly. The following
lemma specifies the convergence and correctness guarantees
of DESCENT.

Lemma 6.1: For given $\epsilon, \delta > 0$, let $\lambda^t$ be the parameter
value at the end of iteration $t \ge 1$ under DESCENT($\epsilon, \delta$).
Then, there exists a unique limit point $\lambda^{\epsilon,\delta}$ such that

$$\|\lambda^t - \lambda^{\epsilon,\delta}\| \le A \exp(-Bt), \qquad (13)$$

for some positive constants $A, B$ (which may depend on the problem
parameters, $\epsilon$ and $\delta$). Let $\lambda^{\epsilon}$ be the solution of CP($\epsilon$). Then,

$$\lim_{\delta \to 0} \lambda^{\epsilon,\delta} = \lambda^{\epsilon}.$$

Further, by taking $\epsilon \to 0$, $\lambda^{\epsilon}$ goes to $\lambda^*$, an optimal solution
to the DUAL.
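
The DESCENT iteration and the fast convergence asserted in Lemma 6.1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `descent`, the edge-list and weight-dictionary inputs, and the sweep cap `max_sweeps` are assumptions made for illustration. The sketch applies the closed-form update (12) to one edge at a time in round-robin order, and stops once no component changed by more than $\delta$ during a full sweep, which is one possible reading of "converges within $\delta$ for each component".

```python
import math


def descent(edges, w, eps, delta, max_sweeps=100000):
    """Cyclic coordinate descent on CP(eps) using the closed-form update (12).

    edges: list of (i, j) tuples; w: dict mapping node -> positive weight.
    Returns a dict mapping each edge tuple to its dual value.
    """
    # Initialization lambda^0_ij = max{w_i, w_j}.
    lam = {(i, j): max(w[i], w[j]) for (i, j) in edges}

    # incident[i] lists the edges touching node i.
    incident = {i: [] for i in w}
    for e in edges:
        incident[e[0]].append(e)
        incident[e[1]].append(e)

    for _ in range(max_sweeps):
        largest_change = 0.0
        for e in edges:  # round-robin edge selection
            i, j = e
            # Messages: node weight minus the duals on the node's other edges.
            a = w[i] - sum(lam[f] for f in incident[i] if f != e)
            b = w[j] - sum(lam[f] for f in incident[j] if f != e)
            root = math.sqrt((a - b) ** 2 + 4 * eps ** 2)
            new_value = max(0.0, (a + b + 2 * eps + root) / 2)
            largest_change = max(largest_change, abs(new_value - lam[e]))
            lam[e] = new_value
        # Assumed stopping rule: no component moved by more than delta.
        if largest_change <= delta:
            break
    return lam
```

On the three-node path with weights 2, 3, 2 the dual optimum is $\lambda = (2, 2)$; calling `descent([(0, 1), (1, 2)], {0: 2.0, 1: 3.0, 2: 2.0}, eps=1e-3, delta=1e-9)` returns values within a few multiples of $\epsilon$ of that optimum.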

We first discuss the proofs of two facts in Lemma 6.1: (a)
$\lim_{\delta \to 0} \lambda^{\epsilon,\delta} = \lambda^{\epsilon}$ is a direct consequence of the fact that if
we ran the DESCENT algorithm with $\delta = 0$, it converges; (b)
the fact that as $\epsilon \to 0$, $\lambda^{\epsilon}$ goes to a dual optimal solution $\lambda^*$
follows from [13, Prop. 4.1.1]. Now, it remains to establish
the convergence of the DESCENT($\epsilon, \delta$) algorithm. This will
follow as a corollary of a result by Luo and Tseng [3]. In order
to state the result in [3], some notation needs to be introduced
as follows.

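Consider a real valued function $\phi : \mathbb{R}^n \to \mathbb{R}$ defined as

$$\phi(z) = \psi(Ez) + \sum_{i=1}^{n} w_i z_i,$$

where $E \in \mathbb{R}^{m \times n}$ is an $m \times n$ matrix with no zero column
(i.e., all coordinates of $z$ are useful), $w = (w_i) \in \mathbb{R}^n$ is a
given fixed vector, and $\psi : \mathbb{R}^m \to \mathbb{R}$ is a strongly convex
function on its domain $D_\psi$.

We take $D_\psi$ to be open and let $\partial D_\psi$ denote its boundary. We
also have that, along any sequence $y^k$ such that $y^k \to \partial D_\psi$
(i.e., approaches the boundary of $D_\psi$), $\psi(y^k) \to \infty$. The goal is
to solve the optimization problem

$$\text{minimize } \phi(z) \quad \text{over } z \in X. \qquad (14)$$

In the above, we assume that $X$ is box-type, i.e.,

$$X = \prod_{i=1}^{n} [\ell_i, u_i]$$

for given lower and upper bounds $\ell_i, u_i$.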


Let $X^*$ be the set of all optimal solutions of the problem
(14). The "round-robin" or "cyclic" coordinate descent algorithm
(the one used in DESCENT) for this problem has the
following convergence property, as proved in Theorem 6.2 of [3].

Lemma 6.2: There exist constants $\alpha'$ and $\beta'$, which may
depend on the problem parameters in terms of $g, E, w$, such
that starting from the initial value $z^0$, we have in iteration $t$
of the algorithm

$$d(z^t, X^*) \le \alpha' \exp(-\beta' t)\, d(z^0, X^*).$$

Here, $d(\cdot, X^*)$ denotes distance to the optimal set $X^*$.

Proof of Lemma 6.1: It suffices to check that the
conditions assumed in the statement of Lemma 6.2 apply in
our set up of Lemma 6.1 in order to complete the proof.

Note first that the constraints $\lambda_{ij} \ge 0$ in CP($\epsilon$) are of
"box-type", as required by Lemma 6.2. Now, we need to show
that $g(\cdot)$ satisfies the conditions that $\phi(\cdot)$ satisfied in (14).
By observation, we see that the linear part in $g(\cdot)$, namely
$\sum_{(i,j) \in E} \lambda_{ij}$, corresponds to the linear part in $\phi$. The other
part in $g(\cdot)$ corresponds to $h(\epsilon, \lambda)$, where we define

$$h(\epsilon, \lambda) = -\epsilon \sum_{i \in V} \log\Big(\sum_{j \in \mathcal{N}(i)} \lambda_{ij} - w_i\Big).$$

By definition, $h(\cdot)$ is strictly convex on its domain, which
is an open set since, for any $i$, if

$$\sum_{j \in \mathcal{N}(i)} \lambda_{ij} \downarrow w_i,$$

then $h(\cdot) \uparrow \infty$. Note that the requirement $h(\cdot) \to \infty$ towards the
boundary corresponding to $\|\lambda\| \to \infty$ can be met by redefining $h(\cdot)$
to include some parts of the linear term in $g(\cdot)$. Finally, the
condition corresponding to $E$ not having any zero column in
(14) follows for any connected graph, which is the case of interest
here. Thus, we have verified the conditions of Lemma 6.2 and
hence established (13). This completes the proof
of Lemma 6.1. ■



C. EST: algorithm

The algorithm DESCENT yields a good approximation of
the optimal solution to DUAL, for small values of $\epsilon$ and $\delta$.
However, our interest is in the (integral) optimum of LP, when
it exists. There is no general procedure to recover an optimum
of a linear program from an optimum of its dual. However, we
show that such a recovery is possible through our algorithm,
called EST and presented below, for the MWIS problem
when $G$ is bipartite with a unique MWIS. This procedure is
likely to extend to general $G$ when the LP relaxation is tight and
LP has a unique solution. In the following, $\delta_1$ is chosen to be
an appropriately small number, and $\lambda$ is expected to be (close
to) a dual optimum.

EST($\lambda, \delta_1$).

(o) The algorithm iteratively estimates $x = (x_i)$ given $\lambda$
(expected to be a dual optimum).

(i) Initially, color a node $i$ gray and set $x_i = 0$ if
$\sum_{j \in \mathcal{N}(i)} \lambda_{ij} > w_i + \delta_1$. Color all other nodes green
and leave their values unspecified.

(ii) Repeat the following steps (in any order) until no more
changes can happen:

o if $i$ is green and there exists a gray node $j \in \mathcal{N}(i)$
with $\lambda_{ij} > \delta_1$, then set $x_i = 1$ and color it orange.

o if $i$ is green and some orange node $j \in \mathcal{N}(i)$, then
set $x_i = 0$ and color it gray.

(iii) If any node is green, say $i$, set $x_i = 1$ and color it red.

(iv) Produce the output $x$ as an estimate.
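
The coloring steps of EST can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function name `est` and the data layout (an edge list, a weight dictionary `w`, and a dictionary `lam` keyed by edge tuple) are assumptions. Step (ii) is implemented as repeated round-robin sweeps over the nodes until a sweep changes nothing.

```python
def est(edges, w, lam, delta1=0.0):
    """Recover a 0/1 estimate x from dual values lam via the EST coloring rules.

    edges: list of (i, j) tuples; w: dict node -> weight;
    lam: dict edge tuple -> dual value. Returns (x, color).
    """
    # incident[i] lists pairs (neighbor, dual value on the shared edge).
    incident = {i: [] for i in w}
    for (i, j) in edges:
        incident[i].append((j, lam[(i, j)]))
        incident[j].append((i, lam[(i, j)]))

    color, x = {}, {}
    # Step (i): nodes whose dual constraint is loose are gray with x_i = 0.
    for i in w:
        if sum(value for _, value in incident[i]) > w[i] + delta1:
            color[i], x[i] = "gray", 0
        else:
            color[i] = "green"

    # Step (ii): propagate until a full sweep makes no change.
    changed = True
    while changed:
        changed = False
        for i in w:
            if color[i] != "green":
                continue
            if any(color[j] == "gray" and value > delta1
                   for j, value in incident[i]):
                color[i], x[i] = "orange", 1
                changed = True
            elif any(color[j] == "orange" for j, _ in incident[i]):
                color[i], x[i] = "gray", 0
                changed = True

    # Step (iii): remaining green nodes become red with x_i = 1.
    for i in w:
        if color[i] == "green":
            color[i], x[i] = "red", 1
    return x, color
```

For the three-node path with weights 2, 3, 2 and its dual optimum $\lambda_{01} = \lambda_{12} = 2$, the middle node is colored gray in step (i) and both end nodes turn orange, so the estimate selects the two end nodes, which is the unique MWIS of that graph.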



D. EST: properties

Lemma 6.3: Let $\lambda^*$ be an optimal solution of DUAL. If $G$ is
a bipartite graph with unique MWIS, then the output produced
by EST($\lambda^*, 0$) is the maximum weight independent set of $G$.

Proof: Let $\hat{x}$ be the output of EST($\lambda^*, 0$), and $x^*$ the unique optimal
MWIS. To establish $\hat{x} = x^*$, it is sufficient to establish
that $\hat{x}$ and $\lambda^*$ together satisfy the complementary slackness
conditions stated in Lemma 2.3, namely

(x1) $\hat{x}_i \big(\sum_{j \in \mathcal{N}(i)} \lambda^*_{ij} - w_i\big) = 0$ for all $i \in V$,
(x2) $(\hat{x}_i + \hat{x}_j - 1)\lambda^*_{ij} = 0$ for all $(i,j) \in E$, and
(x3) $\hat{x}$ is a feasible solution for the IP.

From the way the color gray is assigned initially, it follows
that either $\hat{x}_i = 0$ or $\sum_{j \in \mathcal{N}(i)} \lambda^*_{ij} - w_i = 0$ for all nodes $i$. Thus
(x1) is satisfied.

Before proceeding we note that all nodes initially colored
gray are correct, i.e. $\hat{x}_i = x^*_i = 0$; this is because the optimal
$x^*$ satisfies (x1). Now consider any node $j$ that is colored
orange due to there being a neighbor $i$ that is one of the initial
grays, and $\lambda^*_{ij} > 0$. For this node we have that $\hat{x}_j = x^*_j = 1$,
because $x^*$ satisfies (x2). Proceeding in this fashion, it is easy
to establish that all nodes colored gray or orange are assigned
values consistent with the actual MWIS $x^*$.

Now to prove (x2), consider a particular edge $(i,j)$.
If $\lambda^*_{ij} = 0$ then (x2) is satisfied. So suppose $\lambda^*_{ij} > 0$,
but $\hat{x}_i + \hat{x}_j \neq 1$. This will happen if both $\hat{x}_i = \hat{x}_j = 0$, or both
are equal to 1. Now, both are equal to 0 only if they are both
colored gray, in which case we know that the actual optimum has
$x^*_i = x^*_j = 0$ as well. But this means that (x2) is violated by
the true optimum $x^*$, which is a contradiction. Thus it has to
be that $\hat{x}_i = \hat{x}_j = 1$ for a violation to occur. However, this is
also a violation of (x3), namely the feasibility of $\hat{x}$ for the IP.
Thus all that remains to be done is to establish (x3).

Assume now that (x3) is violated, i.e. there exists a subset
$E'$ of the edges whose both endpoints are set to 1. Let
$S_1 \subset V_1$, $S_2 \subset V_2$ be these endpoints. Note that, by assumption,
$S_1 \neq \emptyset$, $S_2 \neq \emptyset$. We now use $S_1$ and $S_2$ to construct
two distinct optima of IP, which will be a violation of our
assumption of uniqueness of the MWIS. The two optima,
denoted $\tilde{x}$ and $\bar{x}$, are obtained as follows: in $\hat{x}$, modify $\hat{x}_i = 0$
for all $i \in S_1$ to obtain $\tilde{x}$; in $\hat{x}$, modify $\hat{x}_i = 0$ for all $i \in S_2$
to obtain $\bar{x}$. We now show that both $\tilde{x}$ and $\bar{x}$ satisfy all three
conditions (x1), (x2) and (x3).

Recall that the nodes in $S_1$ and $S_2$ must have been colored
red by the algorithm EST. Now, we establish optimality of $\tilde{x}$
and $\bar{x}$. By construction, both $\tilde{x}$ and $\bar{x}$ satisfy (x1) since we
have only changed the assignment of red nodes, which were not
binding for constraint (x1).

Now, we turn our attention towards (x2) and (x3) for $\tilde{x}$
and $\bar{x}$. Again, both solutions satisfy (x2) and (x3) along edges
$(i,j) \in E$ such that $i \in S_1$, $j \in S_2$, or else they would not have
been colored red. By construction, they satisfy (x3) along all
other edges as well. Now we show that $\tilde{x}$, $\bar{x}$ satisfy (x2) along
edges $(i,j) \in E$ such that $i \in S_1, j \notin S_2$ or $i \notin S_1, j \in S_2$.
For this, we claim that all such edges must have $\lambda^*_{ij} = 0$: if
not, that is $\lambda^*_{ij} > 0$, then either $i$ or $j$ must have been colored
orange, and an orange node cannot be part of $S_1$ or $S_2$. Thus,
we have established that both $\tilde{x}$ and $\bar{x}$ along with $\lambda^*$ satisfy
(x1), (x2) and (x3). The contradiction is thus established.

Thus, we have established that $\hat{x}$ along with $\lambda^*$ satisfies
(x1), (x2) and (x3). Therefore, $\hat{x}$ is the optimal solution of
LP, and hence of the IP. This completes the proof. ■
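
The three conditions (x1), (x2) and (x3) used in the proof are easy to check numerically for a candidate pair. The following Python sketch does so under the assumption that the graph is given as an edge list with a weight dictionary `w`, a dual dictionary `lam` keyed by edge tuple, and a 0/1 dictionary `x`; the function name `satisfies_conditions` and the tolerance `tol` are illustrative choices, not part of the paper.

```python
def satisfies_conditions(edges, w, lam, x, tol=1e-9):
    """Check conditions (x1), (x2), (x3) for a candidate x and duals lam."""
    # Total dual mass on the edges incident to each node.
    load = {i: 0.0 for i in w}
    for (i, j) in edges:
        load[i] += lam[(i, j)]
        load[j] += lam[(i, j)]

    # (x1): x_i * (sum_j lam_ij - w_i) = 0 for every node.
    x1 = all(abs(x[i] * (load[i] - w[i])) <= tol for i in w)
    # (x2): (x_i + x_j - 1) * lam_ij = 0 for every edge.
    x2 = all(abs((x[i] + x[j] - 1) * lam[(i, j)]) <= tol for (i, j) in edges)
    # (x3): x is integral and no edge has both endpoints selected.
    x3 = (all(x[i] in (0, 1) for i in w)
          and all(x[i] + x[j] <= 1 for (i, j) in edges))
    return x1 and x2 and x3
```

On the three-node path with weights 2, 3, 2 and duals $\lambda_{01} = \lambda_{12} = 2$, the assignment selecting the two end nodes passes all three checks, whereas selecting only the middle node fails (x1).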
Now, consider a version of EST where we check for
updating nodes in a round-robin manner. That is, in an iteration
we perform $O(n)$ operations. Now, we state a simple bound on
the running time of EST.

Lemma 6.4: The algorithm EST stops after at most $O(n)$
iterations.

Proof: The algorithm stops after the iteration in which no
node's status is updated. Since each node can be updated
at most once, with the above stopping condition the algorithm
can run for at most $O(n)$ iterations. This completes the proof
of Lemma 6.4. ■



E. Overall algorithm: convergence and correctness 

Before stating the convergence, correctness and bound on convergence
time of the ALGO($\epsilon, \delta, \delta_1$) algorithm, a few remarks
are in order. We first note that both DESCENT and
EST are iterative message-passing procedures. Second, when
the MWIS is unique, DESCENT need not produce an exact
dual optimum for EST to obtain the correct answer. Finally, it
is important to note that the above algorithm always converges
quickly, but may not produce a good estimate when the LP relaxation
is not tight. Next, we state the precise statement of this
result.

Theorem 6.1 (Convergence & Correctness): The algorithm
ALGO($\epsilon, \delta, \delta_1$) converges for any choice of $\epsilon, \delta > 0$ and for
any $G$. The solution obtained by it is correct if $G$ is bipartite,
LP has a unique solution, and $\epsilon, \delta > 0$, $\delta_1$ are small enough.

Proof: The claim that algorithm ALGO($\epsilon, \delta, \delta_1$) converges
for all values of $\epsilon, \delta, \delta_1$ and for any $G$ follows immediately
from Lemmas 6.1, 6.3 and 6.4. Next, we turn to
the correctness property.

Lemma 6.1 implies that as $\delta \to 0$, the output of
DESCENT($\epsilon, \delta$) satisfies $\lambda^{\epsilon,\delta} \to \lambda^{\epsilon}$, where $\lambda^{\epsilon}$ is the solution of
CP($\epsilon$). Again, as noted in Lemma 6.1, $\lambda^{\epsilon} \to \lambda^*$ as $\epsilon \to 0$,
where $\lambda^*$ is an optimal solution^2 of the DUAL. Therefore,
given $\delta > 0$, for small enough $\epsilon > 0$ we have

$$|\lambda^{\epsilon}_{ij} - \lambda^{*}_{ij}| \le \frac{\delta}{6n} \quad \text{for all } (i,j) \in E.$$

^2 There may be multiple dual optima, and in this case $\lambda^{\epsilon}$ may not have a
unique limit. However, every limit point will be a dual optimum. In that case,
the same proof still holds; we skip it here to keep arguments simple.

We will suppose that $\epsilon$ is chosen such. As noted
earlier, the algorithm converges for all choices of $\epsilon$. Therefore,
by Lemma 6.1 there exists a large enough $T$ such that for $t \ge T$,
we have

$$|\lambda^{t}_{ij} - \lambda^{\epsilon}_{ij}| \le \frac{\delta}{3n} \quad \text{for all } (i,j) \in E.$$

Thus, by the triangle inequality, for $t \ge T$ we have

$$|\lambda^{t}_{ij} - \lambda^{*}_{ij}| \le \frac{2\delta}{3n} \quad \text{for all } (i,j) \in E. \qquad (15)$$

Now, recall Lemma 6.3. It established that EST($\lambda^*, 0$)
produces the correct max. weight independent set as its output
under the hypothesis of Theorem 6.1. Also recall that the algorithm
EST($\lambda^*, 0$) checks two conditions: (a) whether $\lambda^*_{ij} > 0$
for $(i,j) \in E$; and (b) whether $\sum_{j \in \mathcal{N}(i)} \lambda^*_{ij} > w_i$. Given
that the number of nodes and edges are finite, there exists a
$\delta$ such that (a) and (b) are robust to noise of $\delta/n$. Therefore,
by selection of small $\delta_1$ for such a choice of $\delta$, we find that the
output of the EST($\lambda^t, \delta_1$) algorithm will be the same as that of
EST($\lambda^*, 0$). This completes the proof. ■

VII. MAP Estimation as an MWIS Problem 

In this section we show that any MAP estimation problem
is equivalent to an MWIS problem on a suitably constructed
graph with node weights. This construction is related to the
"overcomplete basis" representation [6]. Consider the following
canonical MAP estimation problem: suppose we are given
a distribution $q(y)$ over vectors $y = (y_1, \ldots, y_M)$ of variables
$y_m$, each of which can take finitely many values. Suppose also that $q$
factors into a product of strictly positive functions, which we
find convenient to denote in exponential form:

$$q(y) = \frac{1}{Z} \exp\Big(\sum_{\alpha \in A} \phi_\alpha(y_\alpha)\Big).$$

Here $\alpha$ specifies the domain of the function $\phi_\alpha$, and $y_\alpha$ is
the vector of those variables that are in the domain of $\phi_\alpha$.
The $\alpha$'s also serve as an index for the functions. $A$ is the
set of functions. The MAP estimation problem is to find a
maximizing assignment $y^* \in \arg\max_y q(y)$.

We now build an auxiliary graph $G$, and assign weights
to its nodes, such that the MAP estimation problem above is
equivalent to finding the MWIS of $G$. There is one node in
$G$ for each pair $(\alpha, y_\alpha)$, where $y_\alpha$ is an assignment (i.e. a set
of values for the variables) of domain $\alpha$. We will denote this
node of $G$ by $\delta(\alpha, y_\alpha)$.

There is an edge in $G$ between any two nodes $\delta(\alpha_1, y^1_{\alpha_1})$
and $\delta(\alpha_2, y^2_{\alpha_2})$ if and only if there exists a variable index $m$
such that

1) $m$ is in both domains, i.e. $m \in \alpha_1$ and $m \in \alpha_2$, and

2) the corresponding variable assignments are different, i.e.
$y^1_m \neq y^2_m$.

In other words, we put an edge between all pairs of nodes that
correspond to inconsistent assignments. Given this graph $G$,
we now assign weights to the nodes. Let $c > 0$ be any number
such that $c + \phi_\alpha(y_\alpha) > 0$ for all $\alpha$ and $y_\alpha$. The existence of
such a $c$ follows from the fact that the set of assignments and
domains is finite. Assign to each node $\delta(\alpha, y_\alpha)$ a weight of
$c + \phi_\alpha(y_\alpha)$.

Lemma 7.1: Suppose $q$ and $G$ are as above. (a) If $y^*$
is a MAP estimate of $q$, let $\delta^* = \{\delta(\alpha, y^*_\alpha) \mid \alpha \in A\}$
be the set of nodes in $G$ that correspond to each domain
being consistent with $y^*$. Then, $\delta^*$ is an MWIS of $G$. (b)
Conversely, suppose $\delta^*$ is an MWIS of $G$. Then, for every
domain $\alpha$, there is exactly one node $\delta(\alpha, y^*_\alpha)$ included in $\delta^*$.
Further, the corresponding domain assignments $\{y^*_\alpha \mid \alpha \in A\}$
are consistent, and the resulting overall vector $y^*$ is a MAP
estimate of $q$.

Proof: A maximal independent set is one in which every
node is either in the set, or is adjacent to another node that
is in the set. Since weights are positive, any MWIS has to be
maximal. For $G$ and $q$ as constructed, it is clear that

1) If $y$ is an assignment of variables, consider the corresponding
set of nodes $\{\delta(\alpha, y_\alpha) \mid \alpha \in A\}$. Each domain
$\alpha$ has exactly one node in this set. Also, this set is an
independent set in $G$, because the partial assignments
$y_\alpha$ for all the nodes are consistent with $y$, and hence
with each other. This means that there will not be an
edge in $G$ between any two nodes in the set.

2) Conversely, if $\Delta$ is a maximal independent set in $G$,
then the partial assignments corresponding to
the nodes in $\Delta$ are all consistent with each other, and
with a global assignment $y$.

There is thus a one-to-one correspondence between maximal
independent sets in $G$ and assignments $y$. The lemma follows
from this observation. ■

Example 7.1: Let $y_1$ and $y_2$ be binary variables with joint
distribution

$$q(y_1, y_2) = \frac{1}{Z} \exp(\theta_1 y_1 + \theta_2 y_2 + \theta_{12} y_1 y_2)$$

where the $\theta$ are any real numbers. The corresponding $G$ is
shown in Figure 3. Let $c$ be any number such that $c + \theta_1$,
$c + \theta_2$ and $c + \theta_{12}$ are all greater than 0. The weights on the
nodes in $G$ are: $\theta_1 + c$ on node "1" on the left, $\theta_2 + c$ for node
"1" on the right, $\theta_{12} + c$ for the node "11", and $c$ for all the
other nodes.




Fig. 3. An example of reduction from MAP problem to max. weight 
independent set problem. 
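
The reduction can be checked on Example 7.1 with a small Python sketch. It is only an illustration under stated assumptions: all variables are taken to be binary, factors are supplied as a dictionary from a tuple of variable indices (the domain $\alpha$) to a function $\phi_\alpha$, the MWIS is found by brute-force enumeration (feasible only for tiny graphs), and the function names and the numerical values of $\theta_1, \theta_2, \theta_{12}$ and $c$ are hypothetical choices made here.

```python
from itertools import combinations, product


def build_mwis_graph(factors, c):
    """Build the auxiliary graph: one node per (domain, partial assignment)."""
    nodes = [(alpha, vals) for alpha in factors
             for vals in product((0, 1), repeat=len(alpha))]
    weight = {(alpha, vals): c + factors[alpha](vals) for alpha, vals in nodes}
    assert min(weight.values()) > 0, "c must make every node weight positive"

    def inconsistent(u, v):
        # Two nodes conflict if some shared variable gets different values.
        asg_u, asg_v = dict(zip(*u)), dict(zip(*v))
        return any(asg_u[m] != asg_v[m] for m in asg_u if m in asg_v)

    edges = {frozenset((u, v)) for u, v in combinations(nodes, 2)
             if inconsistent(u, v)}
    return nodes, weight, edges


def brute_force_mwis(nodes, weight, edges):
    """Enumerate all subsets and keep the heaviest independent one."""
    best, best_weight = (), float("-inf")
    for r in range(len(nodes) + 1):
        for subset in combinations(nodes, r):
            if any(frozenset(pair) in edges for pair in combinations(subset, 2)):
                continue
            total = sum(weight[v] for v in subset)
            if total > best_weight:
                best, best_weight = subset, total
    return set(best), best_weight


def decode(independent_set):
    """Merge the partial assignments of the chosen nodes into one vector y."""
    y = {}
    for alpha, vals in independent_set:
        y.update(zip(alpha, vals))
    return y


# Example 7.1 with hypothetical parameter values.
theta1, theta2, theta12 = 1.0, -0.5, -2.0
factors = {
    (1,): lambda v: theta1 * v[0],
    (2,): lambda v: theta2 * v[0],
    (1, 2): lambda v: theta12 * v[0] * v[1],
}
nodes, weight, edges = build_mwis_graph(factors, c=3.0)
best_set, best_weight = brute_force_mwis(nodes, weight, edges)
y_map = decode(best_set)
```

With these values the exponent $\theta_1 y_1 + \theta_2 y_2 + \theta_{12} y_1 y_2$ is maximized at $(y_1, y_2) = (1, 0)$, and the decoded MWIS gives the same assignment, with one selected node per domain as Lemma 7.1 states.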



VIII. Discussion 

We believe this paper opens several interesting directions 
for investigation. In general, the exact relationship between 
max-product and linear programming is not well understood. 
Their close similarity for the MWIS problem, along with the 
reduction of MAP estimation to an MWIS problem, suggests 
that the MWIS problem may provide a good first step in an 
investigation of this relationship. 

Our novel message-passing algorithm and the reduction of
MAP estimation to an MWIS problem immediately yield a
new message-passing algorithm for the general MAP estimation
problem. It would be interesting to investigate the power of
this algorithm on more general discrete estimation problems.